我们提出了一种用于分布式培训神经网络模型的新型联合学习方法,其中服务器在每轮中随机选择的设备的子集之间编制协作。我们主要从通信角度查看联合学习问题,并允许更多设备级别计算来节省传输成本。我们指出了一个基本的困境,因为当地 - 设备水平的最低实证损失与全球经验损失的最小值不一致。与最近的事先有关的不同,尝试无所作用的最小化或利用用于并行化梯度计算的设备,我们为每轮的每个设备提出动态规范器,以便在极限中,全局和设备解决方案对齐。我们通过实证结果对真实的和合成数据以及我们的方案在凸和非凸面设置中导致有效培训的分析结果,同时对设备异质性完全不可知,以及大量设备,部分参与和不平衡的数据。
translated by 谷歌翻译
利用稀疏性是加速在移动设备上的量化卷积神经网络(CNN)推断的关键技术。现有稀疏的CNN加速器主要利用无结构性稀疏性并实现显着的加速。然而,由于无界,很大程度上不可预测的稀疏模式,利用非结构化稀疏性需要复杂的硬件设计,具有显着的能量和面积开销,这对能量和区域效率至关重要的移动/ IOT推理场景特别有害。我们建议利用结构化的稀疏性,更具体地,更密集地绑定块(DBB)稀疏性,用于重量和激活。 DBB块张于每个块的最大非零数。因此,DBB暴露静态可预测的稀疏模式,使瘦稀疏性利用硬件能够。我们提出了新的硬件基元,以分别为(静态)权重和(动态)激活的DBB稀疏性,具有非常低的开销。建立在基元的顶部,我们描述了一种基于收缩阵列的CNN加速器的S2TA,可利用联合重量和激活DBB稀疏性和传统的收缩系统阵列上不可用的数据重用的新维度。与具有零值时钟门控的完全阵列的强基线相比,16NM中的S2TA达到超过2倍的加速和能量减少,超过五个流行的CNN基准。与近期的非收缩稀疏加速器相比,Eyeriss V2(65nm)和Sparten(45nm),S2TA在65nm中使用约2.2倍和3.1倍的每次推断的能量较少。
translated by 谷歌翻译
Extracting complex structures from grid-based data is a common key step in automated medical image analysis. The conventional solution to recovering tree-structured geometries typically involves computing the minimal cost path through intermediate representations derived from segmentation masks. However, this methodology has significant limitations in the context of projective imaging of tree-structured 3D anatomical data such as coronary arteries, since there are often overlapping branches in the 2D projection. In this work, we propose a novel approach to predicting tree connectivity structure which reformulates the task as an optimization problem over individual steps of a recursive process. We design and train a two-stage model which leverages the UNet and Transformer architectures and introduces an image-based prompting technique. Our proposed method achieves compelling results on a pair of synthetic datasets, and outperforms a shortest-path baseline.
translated by 谷歌翻译
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to readout information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
translated by 谷歌翻译
Cohn and Umans proposed a framework for developing fast matrix multiplication algorithms based on the embedding computation in certain groups algebras. In subsequent work with Kleinberg and Szegedy, they connected this to the search for combinatorial objects called strong uniquely solvable puzzles (strong USPs). We begin a systematic computer-aided search for these objects. We develop and implement constraint-based algorithms build on reductions to $\mathrm{SAT}$ and $\mathrm{IP}$ to verify that puzzles are strong USPs, and to search for large strong USPs. We produce tight bounds on the maximum size of a strong USP for width $k \le 5$, construct puzzles of small width that are larger than previous work, and improve the upper bounds on strong USP size for $k \le 12$. Although our work only deals with puzzles of small-constant width, the strong USPs we find imply matrix multiplication algorithms that run in $O(n^\omega)$ time with exponent $\omega \le 2.66$. While our algorithms do not beat the fastest algorithms, our work provides evidence and, perhaps, a path to finding families of strong USPs that imply matrix multiplication algorithms that are more efficient than those currently known.
translated by 谷歌翻译
Agile robotics presents a difficult challenge with robots moving at high speeds requiring precise and low-latency sensing and control. Creating agile motion that accomplishes the task at hand while being safe to execute is a key requirement for agile robots to gain human trust. This requires designing new approaches that are flexible and maintain knowledge over world constraints. In this paper, we consider the problem of building a flexible and adaptive controller for a challenging agile mobile manipulation task of hitting ground strokes on a wheelchair tennis robot. We propose and evaluate an extension to work done on learning striking behaviors using a probabilistic movement primitive (ProMP) framework by (1) demonstrating the safe execution of learned primitives on an agile mobile manipulator setup, and (2) proposing an online primitive refinement procedure that utilizes evaluative feedback from humans on the executed trajectories.
translated by 谷歌翻译
Curating datasets for object segmentation is a difficult task. With the advent of large-scale pre-trained generative models, conditional image generation has been given a significant boost in result quality and ease of use. In this paper, we present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions, without requiring segmentation labels. We leverage and explore pre-trained latent diffusion models, to automatically generate weak segmentation masks for concepts and objects. The masks are then used to fine-tune the diffusion model on an inpainting task, which enables fine-grained removal of the object, while at the same time providing a synthetic foreground and background dataset. We demonstrate that using this method beats previous methods in both discriminative and generative performance and closes the gap with fully supervised training while requiring no pixel-wise object labels. We show results on the task of segmenting four different objects (humans, dogs, cars, birds).
translated by 谷歌翻译
Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed MONAI Consortium, an open-source community which is building standards for AI deployment in healthcare institutions, and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes which take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
translated by 谷歌翻译
Automated cellular instance segmentation is a process utilized for accelerating biological research for the past two decades, and recent advancements have produced higher quality results with less effort from the biologist. Most current endeavors focus on completely cutting the researcher out of the picture by generating highly generalized models. However, these models invariably fail when faced with novel data, distributed differently than the ones used for training. Rather than approaching the problem with methods that presume the availability of large amounts of target data and computing power for retraining, in this work we address the even greater challenge of designing an approach that requires minimal amounts of new annotated data as well as training time. We do so by designing specialized contrastive losses that leverage the few annotated samples very efficiently. A large set of results show that 3 to 5 annotations lead to models with accuracy that: 1) significantly mitigate the covariate shift effects; 2) matches or surpasses other adaptation methods; 3) even approaches methods that have been fully retrained on the target distribution. The adaptation training is only a few minutes, paving a path towards a balance between model performance, computing requirements and expert-level annotation needs.
translated by 谷歌翻译
Neural compression offers a domain-agnostic approach to creating codecs for lossy or lossless compression via deep generative models. For sequence compression, however, most deep sequence models have costs that scale with the sequence length rather than the sequence complexity. In this work, we instead treat data sequences as observations from an underlying continuous-time process and learn how to efficiently discretize while retaining information about the full sequence. As a consequence of decoupling sequential information from its temporal discretization, our approach allows for greater compression rates and smaller computational complexity. Moreover, the continuous-time approach naturally allows us to decode at different time intervals. We empirically verify our approach on multiple domains involving compression of video and motion capture sequences, showing that our approaches can automatically achieve reductions in bit rates by learning how to discretize.
translated by 谷歌翻译